18.05 Lecture 19 
March 28, 2005 



Covariance and Correlation 

Consider two random variables X, Y with variances
$\sigma_X^2 = \mathrm{Var}(X)$, $\sigma_Y^2 = \mathrm{Var}(Y)$.
Definition 1: 

Covariance of X and Y is defined as: 



$$\mathrm{Cov}(X, Y) = E(X - EX)(Y - EY)$$

The covariance is positive when X and Y tend to deviate from their means in the same direction (both above or both below).
Definition 2: 

Correlation of X and Y is defined as: 



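$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

Dividing by the standard deviations removes the scaling from the covariance.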



$$\mathrm{Cov}(X, Y) = E(XY - X\,EY - Y\,EX + EX\,EY) = E(XY) - EX\,EY - EY\,EX + EX\,EY = E(XY) - EX\,EY$$

$$\mathrm{Cov}(X, Y) = E(XY) - EX\,EY$$
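As a quick sanity check, the two expressions for the covariance can be compared on a small discrete distribution. The joint p.f. below and the helper `expect` are made up for illustration; exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction

# Hypothetical joint p.f. of (X, Y): maps (x, y) -> probability.
joint = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
}

def expect(g, pf):
    """E g(X, Y) for a finite joint p.f."""
    return sum(p * g(x, y) for (x, y), p in pf.items())

ex = expect(lambda x, y: x, joint)
ey = expect(lambda x, y: y, joint)

# Definition: E(X - EX)(Y - EY)
cov_def = expect(lambda x, y: (x - ex) * (y - ey), joint)
# Shortcut: E(XY) - EX EY
cov_short = expect(lambda x, y: x * y, joint) - ex * ey

print(cov_def, cov_short)  # both expressions give the same value
```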

Property 1: 

If the variables are independent, $\mathrm{Cov}(X, Y) = 0$ (they are uncorrelated):
$$\mathrm{Cov}(X, Y) = E(XY) - EX\,EY = EX\,EY - EX\,EY = 0$$

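Example: X takes values $\{-1, 0, 1\}$ with equal probabilities $\{\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\}$, and $Y = X^2$.

X and Y are dependent, but they are uncorrelated:
$$\mathrm{Cov}(X, Y) = EX^3 - EX\,EX^2,$$
but $EX = 0$, and $EX^3 = EX = 0$ because $X^3 = X$ on these values.
The covariance is 0, but the variables are still dependent.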
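The example can be verified exactly in a few lines. This is only a sketch; the variable names are chosen here for illustration. Dependence is shown by comparing P(X = 0, Y = 0) with P(X = 0) P(Y = 0).

```python
from fractions import Fraction

third = Fraction(1, 3)
px = {-1: third, 0: third, 1: third}  # p.f. of X

ex = sum(p * x for x, p in px.items())       # EX
ex2 = sum(p * x**2 for x, p in px.items())   # EX^2 = EY
ex3 = sum(p * x**3 for x, p in px.items())   # EX^3 = E(XY)
cov = ex3 - ex * ex2                         # Cov(X, Y) with Y = X^2

# Y = 0 exactly when X = 0, so P(X=0, Y=0) = P(X=0) and P(Y=0) = P(X=0).
p_joint = px[0]
p_product = px[0] * px[0]

print(cov)                 # zero covariance
print(p_joint, p_product)  # these differ, so X and Y are dependent
```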
Also, the correlation is always between -1 and 1.

Cauchy-Schwarz Inequality:

$$(EXY)^2 \le EX^2\, EY^2$$

Also known as the dot-product inequality.
To prove it for expectations, consider

$$\varphi(t) = E(tX + Y)^2 = t^2\, EX^2 + 2t\, EXY + EY^2 \ge 0$$

This is a quadratic in t whose parabola is non-negative for all t, so it has at most one real root and its discriminant (divided by 4) is non-positive:

$$D = (EXY)^2 - EX^2\, EY^2 \le 0$$

Equality is possible only if $\varphi(t) = 0$ for some point t.

$\varphi(t) = E(tX + Y)^2 = 0$ if $tX + Y = 0$, i.e. $Y = -tX$, linear dependence.
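The argument can be traced numerically on a small example. The data points below are hypothetical, each pair taken with equal probability; `phi` evaluates E(tX + Y)^2 directly, and `d` is the discriminant divided by 4.

```python
from fractions import Fraction

# Hypothetical data: (X, Y) takes each listed pair with equal probability.
xs = [1, -2, 0, 3]
ys = [2, 1, -1, 1]
n = len(xs)

ex2 = Fraction(sum(x * x for x in xs), n)              # EX^2
ey2 = Fraction(sum(y * y for y in ys), n)              # EY^2
exy = Fraction(sum(x * y for x, y in zip(xs, ys)), n)  # EXY

def phi(t):
    """phi(t) = E(tX + Y)^2, computed directly from the points."""
    return sum((t * x + y) ** 2 for x, y in zip(xs, ys)) / n

d = exy**2 - ex2 * ey2  # discriminant divided by 4
t_min = -exy / ex2      # vertex of the parabola

print(d)           # non-positive
print(phi(t_min))  # minimum of phi, non-negative and equal to -d / EX^2
```

Since the points do not lie on a line through the origin, the minimum of phi is strictly positive and the inequality is strict.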

$$(\mathrm{Cov}(X, Y))^2 = (E(X - EX)(Y - EY))^2 \le E(X - EX)^2\, E(Y - EY)^2 = \sigma_X^2 \sigma_Y^2$$

$$|\mathrm{Cov}(X, Y)| \le \sigma_X \sigma_Y, \qquad |\rho(X, Y)| = \frac{|\mathrm{Cov}(X, Y)|}{\sigma_X \sigma_Y} \le 1$$

So, the correlation is between -1 and 1. 
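A small experiment illustrates the bound. The sketch below draws many random five-point distributions (equal mass on each point; the seed and sample sizes are arbitrary choices) and records the correlation of each. The helper `correlation` is written here for illustration and is not a library function.

```python
import math
import random

def correlation(xs, ys):
    """rho for the distribution putting equal mass on the points (xs[i], ys[i])."""
    n = len(xs)
    ex = sum(xs) / n
    ey = sum(ys) / n
    cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n
    var_x = sum((x - ex) ** 2 for x in xs) / n
    var_y = sum((y - ey) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed, arbitrary choice
rhos = []
for _ in range(1000):
    xs = [rng.gauss(0, 1) for _ in range(5)]
    ys = [rng.gauss(0, 1) for _ in range(5)]
    rhos.append(correlation(xs, ys))

print(min(rhos), max(rhos))  # both stay inside [-1, 1]
```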



Property 2:

$$-1 \le \rho(X, Y) \le 1$$



When is the correlation equal to 1 or -1?

$|\rho(X, Y)| = 1$ only when $Y - EY = c(X - EX)$,

or $Y = aX + b$ for some constants $a \ne 0$ and b.

(This occurs when the data points lie on a straight line.)



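If $Y = aX + b$:

$$\rho(X, Y) = \frac{E(aX^2 + bX) - EX\,E(aX + b)}{\sqrt{\mathrm{Var}(X) \times a^2\,\mathrm{Var}(X)}} = \frac{a\,\mathrm{Var}(X)}{|a|\,\mathrm{Var}(X)} = \frac{a}{|a|} = \mathrm{sign}(a)$$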



If a is positive, then the correlation = 1, X and Y are completely positively correlated. 
If a is negative, then correlation = -1, X and Y are completely negatively correlated. 
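To see the sign rule in action, the sketch below computes the correlation of X and Y = aX + b for a made-up distribution of X. The p.f. `px` and the function `rho_linear` are illustrative assumptions, not part of the lecture.

```python
import math
from fractions import Fraction

# Hypothetical p.f. of X: value -> probability.
px = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}

def rho_linear(a, b):
    """Correlation of X and Y = aX + b under px (a must be nonzero)."""
    ex = sum(p * x for x, p in px.items())
    ey = sum(p * (a * x + b) for x, p in px.items())
    cov = sum(p * (x - ex) * (a * x + b - ey) for x, p in px.items())
    var_x = sum(p * (x - ex) ** 2 for x, p in px.items())
    var_y = sum(p * (a * x + b - ey) ** 2 for x, p in px.items())
    return float(cov) / math.sqrt(var_x * var_y)

print(rho_linear(2, 1))    # positive slope
print(rho_linear(-5, 7))   # negative slope
```

The intercept b has no effect, and only the sign of a matters.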



Looking at the distribution of points on $Y = X^2$ (with X symmetric about 0, as in the example above), there is NO linear dependence: correlation = 0.
However, if $Y = X^2 + cX$ with $c \ne 0$, then some linear dependence is introduced, and the graph is skewed.
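A short computation makes the effect of the linear term concrete. This sketch assumes X is uniform on {-1, 0, 1}, as in the earlier example; the function name `cov_with_skew` is made up here.

```python
from fractions import Fraction

third = Fraction(1, 3)
px = {-1: third, 0: third, 1: third}  # X uniform on {-1, 0, 1} (assumed)

def cov_with_skew(c):
    """Cov(X, Y) for Y = X^2 + cX, via E(XY) - EX EY."""
    ex = sum(p * x for x, p in px.items())
    ey = sum(p * (x**2 + c * x) for x, p in px.items())
    exy = sum(p * x * (x**2 + c * x) for x, p in px.items())
    return exy - ex * ey

for c in (0, 1, -2):
    print(c, cov_with_skew(c))
```

For this X the covariance works out to c EX^2, so it vanishes only when c = 0.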



Property 3:

$$\mathrm{Var}(X + Y) = E(X + Y - EX - EY)^2 = E((X - EX) + (Y - EY))^2 = E(X - EX)^2 + 2E(X - EX)(Y - EY) + E(Y - EY)^2 = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$$

Conditional Expectation:
(X, Y) - random pair.

What is the average value of Y given that you know X?

If f(x, y) is the joint p.d.f. or p.f., then f(y|x) is the conditional p.d.f. or p.f.

Conditional expectation:

$$E(Y|X) = h(X) = \int y f(y|X)\,dy$$

This is a function of X, so it is still a random variable.

Property 4:

$$E(E(Y|X)) = EY$$

Proof:

$$\begin{aligned}
E(E(Y|X)) = E(h(X)) &= \int h(x) f(x)\,dx = \int \left( \int y f(y|x)\,dy \right) f(x)\,dx \\
&= \iint y f(y|x) f(x)\,dy\,dx = \iint y f(x, y)\,dy\,dx \\
&= \int y \left( \int f(x, y)\,dx \right) dy = \int y f(y)\,dy = EY
\end{aligned}$$
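The same steps can be followed for a discrete pair, with sums in place of integrals. The joint p.f. below is a made-up example; `h` holds the values of the conditional expectation E(Y | X = x).

```python
from fractions import Fraction

# Hypothetical joint p.f. f(x, y): maps (x, y) -> probability.
joint = {
    (0, 1): Fraction(1, 6),
    (0, 2): Fraction(1, 3),
    (1, 1): Fraction(1, 4),
    (1, 3): Fraction(1, 4),
}

# Marginal p.f. of X: f(x) = sum over y of f(x, y).
fx = {}
for (x, y), p in joint.items():
    fx[x] = fx.get(x, 0) + p

# h(x) = E(Y | X = x) = sum over y of y f(x, y) / f(x).
h = {
    x0: sum(y * p for (x, y), p in joint.items() if x == x0) / fx[x0]
    for x0 in fx
}

lhs = sum(h[x] * fx[x] for x in fx)          # E(E(Y|X)) = E h(X)
rhs = sum(y * p for (x, y), p in joint.items())  # EY

print(h)
print(lhs, rhs)  # the two expectations agree
```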

Property 5: 

$$E(a(X)Y|X) = a(X)\,E(Y|X)$$

See text for proof. 
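The text's proof is not reproduced here. As a sketch only, assuming a conditional p.d.f. f(y|x) exists: once X = x is given, the factor a(x) is a constant and can be taken out of the integral.

```latex
\begin{aligned}
E(a(X)Y \mid X = x) &= \int a(x)\, y\, f(y \mid x)\, dy \\
                    &= a(x) \int y\, f(y \mid x)\, dy \\
                    &= a(x)\, E(Y \mid X = x).
\end{aligned}
```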

Summary of Common Distributions: 

1) Bernoulli Distribution: B(p), $p \in [0, 1]$ - parameter

Possible values of the random variable: $X \in \{0, 1\}$; $f(x) = p^x (1 - p)^{1 - x}$

$P(1) = p$, $P(0) = 1 - p$

$E(X) = p$, $\mathrm{Var}(X) = p(1 - p)$

2) Binomial Distribution: B(n, p), n repetitions of Bernoulli

$X \in \{0, 1, \ldots, n\}$; $f(x) = \binom{n}{x} p^x (1 - p)^{n - x}$

$E(X) = np$, $\mathrm{Var}(X) = np(1 - p)$
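The mean and variance formulas can be checked directly from the p.f. In this sketch the parameter values are arbitrary, and `binomial_pf` and `mean_var` are helper names introduced for illustration; taking n = 1 gives the Bernoulli case.

```python
from fractions import Fraction
from math import comb

def binomial_pf(n, p):
    """p.f. of B(n, p) as a dict: x -> C(n, x) p^x (1 - p)^(n - x)."""
    return {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

def mean_var(pf):
    """Mean and variance of a finite p.f. given as value -> probability."""
    mean = sum(x * q for x, q in pf.items())
    var = sum((x - mean) ** 2 * q for x, q in pf.items())
    return mean, var

n, p = 5, Fraction(1, 3)          # arbitrary example parameters
mean, var = mean_var(binomial_pf(n, p))
print(mean, n * p)                # E(X) against np
print(var, n * p * (1 - p))       # Var(X) against np(1 - p)

bern_mean, bern_var = mean_var(binomial_pf(1, p))  # Bernoulli B(p)
print(bern_mean, bern_var)
```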

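3) Exponential Distribution: $E(\alpha)$, parameter $\alpha > 0$

$X \in [0, \infty)$, p.d.f. $f(x) = \alpha e^{-\alpha x}$ for $x \ge 0$; 0 otherwise

$$EX = \frac{1}{\alpha}, \qquad EX^k = \frac{k!}{\alpha^k}$$

$$\mathrm{Var}(X) = \frac{2}{\alpha^2} - \frac{1}{\alpha^2} = \frac{1}{\alpha^2}$$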
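The moment formula can be checked by numerical integration. This is a rough sketch: `exp_moment` is a hypothetical helper using the midpoint rule on a truncated range, and the truncation point and step count are arbitrary choices that happen to be accurate enough for small k.

```python
import math

def exp_moment(k, alpha, upper=60.0, steps=100_000):
    """Midpoint-rule approximation of E X^k for the exponential p.d.f.

    Integrates x^k * alpha * exp(-alpha x) over [0, upper / alpha];
    the tail beyond that point is negligible for small k.
    """
    b = upper / alpha
    h = b / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x**k * alpha * math.exp(-alpha * x)
    return total * h

alpha = 2.0  # arbitrary example parameter
for k in (1, 2, 3):
    print(k, exp_moment(k, alpha), math.factorial(k) / alpha**k)

variance = exp_moment(2, alpha) - exp_moment(1, alpha) ** 2
print(variance, 1 / alpha**2)
```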

** End of Lecture 19